Abstract. This paper considers multiprocessor task scheduling in a
multistage hybrid flow-shop environment. Even in its simplest form, the
problem is NP-hard in the strong sense. Beyond its theoretical complexity,
interest in this problem is driven by the needs of various manufacturing
and computing systems. We propose a new approach based on limited
discrepancy search to solve the problem. Our method is tested against a
proposed lower bound as well as the best-known solutions in the
literature. Computational results show that the developed approach is
efficient, in particular for large-size problems.

Keywords: Hybrid flow shop scheduling. Multiprocessor tasks. Discrep- 
ancy search 



1 Introduction 

Flow shop scheduling refers to a manufacturing facility in which all jobs visit 
the production machines in the same order. In hybrid flow shop scheduling, the
jobs serially traverse stages following the same production route, and must be 
assigned to one of the parallel machines composing each stage. The hybrid flow 
shop scheduling problem with multiprocessor tasks is itself a generalization of 
the hybrid flow shop problem, allowing tasks to be processed on more than one 
processor in a given stage, at a time. It can also be viewed as a specific case of 
the resource-constrained project scheduling problem (RCPSP). 

Many applications of hybrid scheduling problems with multiprocessor tasks
can be found in various manufacturing systems (e.g., workforce assignment in
[5], transportation problem with recirculation in [1]), as well as in some computer
systems (e.g., real-time machine vision

The hybrid flow shop scheduling problem with multiprocessor tasks has received
considerable attention from researchers and has been solved by various
approaches, e.g., genetic algorithms [2], tabu search, and ant colony system [TO].



Motivated by the success of discrepancy search for solving shop scheduling
problems, in particular hybrid flow shop problems [2], [3], we propose in this
paper a new approach based on discrepancy search to solve the hybrid flow
shop problem with multiprocessor tasks.

2 Problem Definition 

The hybrid flow shop scheduling problem with multiprocessor tasks can be
formally described as follows. A set $J = \{1, 2, \ldots, n\}$ of $n$ jobs has to be
processed in $m$ stages. Hence, a job is a sequence of $m$ tasks (one task for
each stage). Each stage $i \in \{1, 2, \ldots, m\}$ consists of $m_i$ identical parallel
processors. In stage $i$, job $j$ simultaneously requires $size_{ij}$ processors.
That is, $size_{ij}$ processors selected at stage $i$ are required for processing
job $j$ for a period of time equal to the processing time requirement of job $j$
at stage $i$, namely $p_{ij}$. The objective is to minimize the makespan
($C_{\max}$), that is, the completion time of all tasks in the last stage.
According to the classical 3-field notation in production scheduling,
the problem is denoted by $Fm(m_1, \ldots, m_m)|size_{ij}|C_{\max}$.
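
To make the definition concrete, the following minimal sketch checks a candidate schedule against the precedence and capacity constraints stated above and returns its makespan. The list-based layout (`capacity[i]` for $m_i$, `p[i][j]`, `size[i][j]`, `start[i][j]`) and the function name are assumptions made for illustration; they do not come from the paper.

```python
def makespan(capacity, p, size, start):
    """Validate start[i][j] (start time of job j at stage i) and return C_max.

    Hypothetical layout: capacity[i] = m_i, p[i][j] = p_ij, size[i][j] = size_ij.
    """
    m, n = len(capacity), len(p[0])
    # Precedence: a job may enter stage i only after finishing stage i-1.
    for j in range(n):
        for i in range(1, m):
            if start[i][j] < start[i - 1][j] + p[i - 1][j]:
                raise ValueError(f"job {j} starts stage {i} too early")
    # Capacity: processor usage only increases at start times, so it is
    # enough to check the usage at each start time of the stage.
    for i in range(m):
        for t in set(start[i]):
            used = sum(size[i][j] for j in range(n)
                       if start[i][j] <= t < start[i][j] + p[i][j])
            if used > capacity[i]:
                raise ValueError(f"stage {i} overloaded at time {t}")
    # Makespan: completion time of the last task in the last stage.
    return max(start[m - 1][j] + p[m - 1][j] for j in range(n))
```

For a toy instance with two stages and three jobs, `makespan([2, 1], [[3, 2, 2], [1, 2, 1]], [[1, 2, 1], [1, 1, 1]], [[2, 0, 2], [5, 2, 4]])` evaluates a feasible schedule in which the job needing both processors of the first stage goes first.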

3 Discrepancy Search 

3.1 General Statement 

Limited discrepancy search (LDS) was introduced in 1995 by Harvey and Gins- 
berg [5] . This seminal method can be considered as an alternative to the branch- 
and-bound procedure, backtracking techniques, and iterative sampling. From 
an optimization view-point this technique is similar to variable neighbourhood 
search. Indeed, it starts from an initial global instantiation suggested by a given 
heuristic and successively explores branches with increasing discrepancies from 
it, in order to obtain a solution (in a satisfaction context), or a solution of bet- 
ter performance (in an optimization context). A discrepancy is associated with 
any decision point in a search tree where the choice goes against the heuristic. 
For convenience, in a tree-like representation the heuristic choices are associ- 
ated with left branches while right branches are considered as discrepancies. 
Since LDS was proposed in 1995, several variants have been suggested, among them
Improved Limited Discrepancy Search (ILDS) pj], Depth-bounded Discrepancy
Search (DDS) [21], Discrepancy-Bounded Depth First Search [T], and Climbing
Discrepancy Search (CDS) [13].

In the following sections, we focus on those methods that inspired our ap- 
proach, in particular DDS and CDS. 
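
As an illustration of the principle described above, the sketch below enumerates the leaves of a binary tree by increasing number of discrepancies, with 0 denoting the heuristic (left) branch and 1 a discrepancy (right branch). It is a simplification under stated assumptions: each iteration k yields the paths with exactly k discrepancies, which avoids the revisits of the original LDS, and the function name `lds_order` is hypothetical.

```python
from itertools import combinations


def lds_order(depth):
    """Yield (k, path) pairs: paths of a binary tree of the given depth,
    grouped by their number k of discrepancies (0 = follow the heuristic)."""
    for k in range(depth + 1):
        # Choose the k decision points at which the heuristic is contradicted.
        for positions in combinations(range(depth), k):
            path = [0] * depth
            for q in positions:
                path[q] = 1
            yield k, tuple(path)
```

The first path produced is always the pure heuristic solution; later iterations move progressively further away from it.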

3.2 Depth-bounded Discrepancy Search 

Depth-bounded Discrepancy Search (DDS), developed in [21], is an improved
LDS that prioritizes discrepancies at the top of the tree in order to correct
early mistakes first. This is ensured by means of an iteratively increasing
bound on the tree depth: discrepancies below this bound are prohibited. DDS
starts from an initial solution. At the ith iteration, it explores those
solutions in which discrepancies occur at a depth not greater than i.
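
The iteration scheme can be sketched on a binary tree as follows. This is a minimal illustration rather than the implementation of [21]: to avoid revisiting leaves, the sketch assumes that iteration i forces a discrepancy exactly at depth i, allows any choice above it, and follows the heuristic below it. The name `dds_order` is hypothetical.

```python
from itertools import product


def dds_order(depth):
    """Yield (iteration, path) pairs for a binary tree of the given depth.

    0 = heuristic branch, 1 = discrepancy. Iteration i only allows
    discrepancies in the first i decisions."""
    yield 0, (0,) * depth  # initial (pure heuristic) solution
    for i in range(1, depth + 1):
        # Any choice in the first i-1 decisions, a forced discrepancy at
        # decision i (assumption), then the heuristic down to the leaf.
        for prefix in product((0, 1), repeat=i - 1):
            yield i, prefix + (1,) + (0,) * (depth - i)
```

With this convention every leaf is visited exactly once, and discrepancies near the root are tried before those deeper in the tree.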

3.3 Climbing Discrepancy Search 

Climbing Discrepancy Search (CDS) is a local search method adapted to
combinatorial optimization problems, proposed in [13]. CDS starts from an
initial solution that is dynamically updated: it visits branches
progressively until a better solution is reached. Then, the initial solution
is updated and the exploration process is restarted.
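
The climbing principle can be sketched as a generic loop. Everything below is an illustrative assumption: solutions are bit tuples, the hypothetical `flip_neighbours` treats a k-discrepancy neighbour as a solution differing in exactly k positions, and `cost` is any function to minimize. The point is only the control flow: as soon as a better solution is found, it becomes the reference solution and the search restarts from one discrepancy.

```python
from itertools import combinations


def flip_neighbours(solution, k):
    """Hypothetical neighbourhood: solutions differing in exactly k positions."""
    for positions in combinations(range(len(solution)), k):
        cand = list(solution)
        for q in positions:
            cand[q] = 1 - cand[q]
        yield tuple(cand)


def climbing_search(initial, neighbours, cost, max_disc):
    """Climbing loop: restart from each improved solution."""
    best, best_cost = initial, cost(initial)
    k = 1
    while k <= max_disc:
        improved = False
        for cand in neighbours(best, k):
            c = cost(cand)
            if c < best_cost:
                # Update the reference solution and restart the exploration.
                best, best_cost, improved = cand, c, True
                break
        k = 1 if improved else k + 1
    return best, best_cost
```

For example, `climbing_search((1, 1, 0, 1), flip_neighbours, sum, 2)` climbs step by step to the all-zero tuple.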

4 Proposal: Climbing Depth-bounded Adjacent 
Discrepancy Search 

4.1 CDADS: Main Features 

In keeping with the problem under consideration, we now consider an
optimization context. We propose the CDADS (Climbing Depth-bounded Adjacent
Discrepancy Search) method, which combines depth-bounded discrepancy search
with climbing discrepancy search. We also assume that, if several
discrepancies occur in the construction of a solution, these discrepancies
are necessarily adjacent in the list of successive decisions. CDADS starts
from an initial solution obtained by a given heuristic and explores its
neighborhood progressively, according to the depth-bounded discrepancy search
strategy. Hence, a limit depth d is fixed, and discrepancies below this bound
are prohibited. At the ith iteration, we allow i discrepancies above the
limit level d.

When considering solutions with more than one discrepancy, we require that
these discrepancies be made consecutively, that is, a solution consists of
discrepancies that occur one after the other. This adjacency assumption
considerably limits the search space. We also consider that the initial
solution is generated by a 'good' heuristic. Thus, only the immediate
neighborhood of a discrepancy may receive an additional discrepancy. We
obtain a truncated DDS based on adjacent discrepancies, DADS (Depth-bounded
Adjacent Discrepancy Search). This approach is illustrated by an example on a
binary tree of depth 3 (see Figure 1). At the starting point, DADS visits the
initial solution recommended by the heuristic. For convenience, we assume
that left branches follow the heuristic. At the first iteration, DADS visits
leaf nodes at the depth limit with exactly one discrepancy. The first line
shown under the branches reports the visit order of the considered solutions,
while the second line gives the number of discrepancies made in each
solution. The 2nd iteration explores further solutions with two
discrepancies, subject to the adjacency assumption. In this representation,
the maximum depth bound is taken to be 3. If we now limit the depth to two
levels, several branches are not retained: branches 4, 6, and 7 would not be
visited by DADS.



[Figure 1: four search-tree panels labeled 0th Iteration, 1st Iteration, 2nd Iteration, and 3rd Iteration]

Fig. 1. Depth-bounded Adjacent Discrepancy Search

Going back to the optimization issue, CDADS merges the DADS strategy
with the CDS exploration principle: the initial solution used by DADS is
dynamically updated whenever a better solution is found, and the exploration
process is restarted.
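
The DADS visit order on the binary-tree illustration can be sketched as follows. This is a minimal sketch under assumptions: 0 denotes the heuristic (left) branch and 1 a discrepancy, iteration k yields the paths whose k discrepancies form one block of consecutive decisions located within the first `limit` levels, and blocks closer to the root are tried first. The function name `dads_order` is hypothetical.

```python
def dads_order(depth, limit):
    """Yield (k, path) pairs visited by a depth-bounded adjacent
    discrepancy search on a binary tree (0 = heuristic, 1 = discrepancy)."""
    yield 0, (0,) * depth  # initial solution given by the heuristic
    for k in range(1, limit + 1):
        # Slide a block of k adjacent discrepancies over the first
        # `limit` decisions, starting from the top of the tree.
        for first in range(limit - k + 1):
            path = [0] * depth
            path[first:first + k] = [1] * k
            yield k, tuple(path)
```

With `depth=3` and `limit=3`, seven leaves are visited and the leaf with two non-adjacent discrepancies, `(1, 0, 1)`, is skipped. With `limit=2`, only the leaves visited 1st, 2nd, 3rd, and 5th in the previous run remain, which agrees with the statement that branches 4, 6, and 7 are dropped.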

4.2 Heuristics 

CDADS depends strongly on the quality of the initial solution. Thus, we
carried out an experimental comparison between various priority rules
presented in the literature [19], [15], and retained the most effective
heuristics for multiprocessor task hybrid flow shop scheduling. The four
selected rules are:

- SPT (Shortest Processing Time), which ranks jobs in ascending order of
their processing times;

- SPR (Shortest Processing Requirement), which ranks jobs in ascending
order of their processing requirement;

- the Energy rule, which considers first the jobs with the smallest energy
(where the energy of an operation j at a stage i is evaluated by
$p_{ij} \times size_{ij}$); and

- NSPT_LastStage (Normalized SPT applied at the last stage). For this
last rule, Şerifoğlu and Ulusoy [15] propose to schedule jobs according to
their ranking index ($RI_j$) defined by:

$$RI_j = \frac{\max_{k}\{p_{mk}\} - p_{mj} + 1}{\max_{k}\{p_{mk}\} + 1}.$$

In Table 1, the selected priority rules are ranked according to their
percentage of best solutions found, that is, their performance.



Table 1. Heuristic selection

Priority Rule     Performance (%)
NSPT_LastStage    27
Energy            25
SPT               17
SPR               14
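
The four priority rules of Section 4.2 can be sketched as sort keys. Two points are assumptions rather than statements from the text: SPT, SPR, and Energy are evaluated here at a single stage chosen by the hypothetical parameter `stage`, and NSPT_LastStage ranks jobs by decreasing ranking index, which amounts to shortest processing time at the last stage. Ties are broken by job index.

```python
def ranking_index(p_last):
    """RI_j = (max_k p_mk - p_mj + 1) / (max_k p_mk + 1) at the last stage."""
    p_max = max(p_last)
    return [(p_max - pj + 1) / (p_max + 1) for pj in p_last]


def priority_lists(p, size, stage=0):
    """Return one job ordering per rule; p[i][j] and size[i][j] are
    hypothetical list-of-lists holding p_ij and size_ij."""
    jobs = range(len(p[0]))
    ri = ranking_index(p[-1])
    return {
        "SPT": sorted(jobs, key=lambda j: p[stage][j]),
        "SPR": sorted(jobs, key=lambda j: size[stage][j]),
        "Energy": sorted(jobs, key=lambda j: p[stage][j] * size[stage][j]),
        # Assumption: a larger ranking index means a higher priority.
        "NSPT_LastStage": sorted(jobs, key=lambda j: -ri[j]),
    }
```

Since Python's `sorted` is stable, jobs with equal keys keep their index order, which gives a deterministic priority list for each rule.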



4.3 Schedule Generation Scheme 

Schedule generation schemes (SGSs) are widely used in solving preemptive prob- 
lems. We distinguish between serial SGS and parallel SGS. These two heuristics 
ensure task scheduling based on a given priority rule. Hence, tasks are selected 
one after the other and a start time is fixed for each one. 

Serial SGSs were introduced in [TT]. At each iteration, the first available
task in C is selected, where C is the priority list recommended by the
priority rule. The selected task is scheduled as soon as possible with
respect to both resource constraints and precedence constraints.

Parallel SGSs, developed in [5], follow a chronological procedure for
scheduling tasks. At each time t, a set $C_t$ of tasks eligible for
scheduling is defined: this set contains the unscheduled tasks that can be
processed at t without violating either precedence constraints or resource
constraints. If t is the first time at which $C_t \neq \emptyset$, the first
task in the priority list C belonging to $C_t$ is started at t. The same
process is applied until all tasks are scheduled. The two schemes described
above may appear similar. However, the schedules they generate are
different: a serial SGS provides an active schedule, while a parallel SGS
generates a non-delay schedule.

In scheduling theory, Sprecher et al. [5D] show that the set of active
schedules includes at least one optimal solution. By contrast, the set of
non-delay schedules may exclude all optima.

Since CDADS does not enumerate all possible solutions, even serial SGSs may
miss all optimal solutions. Furthermore, in practice, parallel SGSs are
known for their operational efficiency. Hence, we opt for the implementation
of a parallel SGS, which moreover proved to be more efficient in our
experimental studies.
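
A parallel SGS for the problem at hand can be sketched as an event-driven loop. This is a minimal sketch, not the authors' C++ implementation; it assumes a job-level priority list (a permutation of all job indices), strictly positive processing times, and `size[i][j] <= capacity[i]`. All names are hypothetical.

```python
def parallel_sgs(capacity, p, size, priority):
    """Build a non-delay schedule; return (start, makespan).

    capacity[i] = m_i, p[i][j] = p_ij, size[i][j] = size_ij,
    priority = job indices in decreasing priority."""
    m, n = len(capacity), len(priority)
    next_stage = [0] * n               # next stage each job has to visit
    ready = [0] * n                    # time at which each job becomes free
    running = [[] for _ in range(m)]   # (end time, processors) per stage
    start = [[None] * n for _ in range(m)]
    remaining, t = m * n, 0
    while True:
        # Release the processors of tasks completed by time t.
        for i in range(m):
            running[i] = [(e, s) for e, s in running[i] if e > t]
        # Start every eligible task at t, in priority order.
        for j in priority:
            i = next_stage[j]
            if i == m or ready[j] > t:
                continue
            free = capacity[i] - sum(s for _, s in running[i])
            if size[i][j] <= free:
                start[i][j] = t
                ready[j] = t + p[i][j]
                running[i].append((ready[j], size[i][j]))
                next_stage[j] += 1
                remaining -= 1
        if remaining == 0:
            break
        # Jump to the next completion time (processors or jobs released).
        events = [e for stage in running for e, _ in stage]
        events += [ready[j] for j in range(n)
                   if next_stage[j] < m and ready[j] > t]
        t = min(events)
    cmax = max(start[m - 1][j] + p[m - 1][j] for j in range(n))
    return start, cmax
```

On a toy instance with `capacity = [2, 1]`, `p = [[3, 2, 2], [1, 2, 1]]`, and `size = [[1, 2, 1], [1, 1, 1]]`, the priority list `[0, 1, 2]` gives a makespan of 7, while `[1, 0, 2]` gives 6. This illustrates why the quality of the priority list, and hence the discrepancies applied to it, matters.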



4.4 Lower Bound 



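For efficiency purposes, we combine CDADS with an evaluation of lower bounds
at each node. The proposed lower bound is based on lower bounds previously
presented in [14]. Thus, we suggest this formula:

$$LB = \max(LB^J, LB^S)$$

where $LB^J$ is a job-based lower bound similar to the one suggested in [14]:

$$LB^J = \max_{j \in J} \Big( \sum_{i=1}^{m} p_{ij} \Big)$$

and $LB^S$ is a stage-based lower bound: $LB^S = \max_{i=1..m} LB(i)$.

For this latter bound, we claim that:

$$LB(i) = \begin{cases}
\max[M_1(i), M_2(i), \max_{j \in J}(p_{ij})] + \min_{j \in J}\Big(\sum_{l=i+1}^{m} p_{lj}\Big), & \forall i = 1 \\
\min_{j \in J}\Big(\sum_{l=1}^{i-1} p_{lj}\Big) + \max[M_1(i), M_2(i), \max_{j \in J}(p_{ij})] + \min_{j \in J}\Big(\sum_{l=i+1}^{m} p_{lj}\Big), & \forall i = 2, \ldots, m-1 \\
\min_{j \in J}\Big(\sum_{l=1}^{i-1} p_{lj}\Big) + \max[M_1(i), M_2(i), \max_{j \in J}(p_{ij})], & \forall i = m
\end{cases}$$

where

$$M_1(i) = \Big\lceil \frac{1}{m_i} \sum_{j \in J} p_{ij} \times size_{ij} \Big\rceil$$

and

$$M_2(i) = \sum_{j \in A_i} p_{ij} + \frac{1}{2} \sum_{j \in B_i} p_{ij}$$

with

$$A_i = \{ j \mid size_{ij} > \tfrac{m_i}{2} \}$$

and

$$B_i = \{ j \mid size_{ij} = \tfrac{m_i}{2} \}.$$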



Justification of the expression of LB(i).

We assume that only non-delay task scheduling is considered.

The first term of $LB(i)$ gives a lower bound on the start time of every
job $j \in J$ on any machine of stage $i$.

The last term can be explained in the same way: it is the minimal time
required to complete the processing of any job $j$ on all the stages that
follow stage $i$.

The middle term concerns the processing of jobs at stage $i$. $M_1(i)$
stands for the mean stage load under preemptive job scheduling, while
$M_2(i)$ accounts for two different situations obtained by partitioning the
jobs according to their resource requirement. Set $A_i$ consists of jobs
that must be processed sequentially (resource requirement greater than half
of the resource capacity $m_i$). Set $B_i$ groups together the jobs having a
resource requirement exactly equal to half of the resource capacity.
Obviously, a job belonging to $A_i$ and another job belonging to $B_i$ must
also be processed sequentially. The added term $\max_{j \in J}(p_{ij})$
strengthens the evaluation of the load of stage $i$, especially when some
jobs with a high processing time are being scheduled.

This justifies the validity of the bound. □
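
The bound can be sketched in a few lines. The exact expressions of $M_1(i)$ and $M_2(i)$ used below are assumptions derived from the verbal justification: $M_1(i)$ is taken as the rounded-up mean load $\lceil \sum_j p_{ij}\,size_{ij} / m_i \rceil$, and $M_2(i)$ as the total processing time of the jobs in $A_i$ plus half that of the jobs in $B_i$. The function and argument names are hypothetical.

```python
def lower_bound(capacity, p, size):
    """Return max(job-based bound, stage-based bound) on the makespan.

    capacity[i] = m_i, p[i][j] = p_ij, size[i][j] = size_ij."""
    m, n = len(capacity), len(p[0])
    jobs = range(n)
    # Job-based bound: the longest total processing time of a single job.
    lb_job = max(sum(p[i][j] for i in range(m)) for j in jobs)
    lb_stage = 0
    for i in range(m):
        # Shortest time any job needs before and after stage i
        # (both are 0 for the first and last stage, respectively).
        head = min(sum(p[l][j] for l in range(i)) for j in jobs)
        tail = min(sum(p[l][j] for l in range(i + 1, m)) for j in jobs)
        # M1 (assumed form): mean load of stage i, rounded up.
        energy = sum(p[i][j] * size[i][j] for j in jobs)
        m1 = -(-energy // capacity[i])
        # M2 (assumed form): jobs using more than half the capacity run
        # sequentially; jobs using exactly half can at best run in pairs.
        a = sum(p[i][j] for j in jobs if 2 * size[i][j] > capacity[i])
        b = sum(p[i][j] for j in jobs if 2 * size[i][j] == capacity[i])
        m2 = a + b / 2
        lb_stage = max(lb_stage, head + max(m1, m2, max(p[i])) + tail)
    return max(lb_job, lb_stage)
```

For the toy data `capacity = [2, 1]`, `p = [[3, 2, 2], [1, 2, 1]]`, `size = [[1, 2, 1], [1, 1, 1]]`, both stages yield a bound of 6, so any schedule of makespan 6 for this instance is optimal.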

5 Computational Study 

5.1 Test Beds 

For comparison purposes, we assess the performance of CDADS on instances of
Oguz's benchmark, available on her home page: http://home.ku.edu.tr/~coguz/public_html/

This benchmark is widely used in the literature [TH|, [10].

The number of jobs is taken to be n = 5, 10, 20, 50, 100, and the number of
stages m takes its value from the set {2, 5, 8}. The benchmark considers two
types of problems, 'Type-1' and 'Type-2'. In 'Type-1' instances, the number
of processors $m_i$ available at each stage i (resource capacity) is
randomly drawn from the set {1, ..., 5}, while in 'Type-2' instances $m_i$
is fixed to 5 processors for every stage i. As a result, 'Type-2' instances
are globally more flexible than 'Type-1' instances. For each combination of
n and m, and for each type, 10 instances are randomly generated, which leads
to a total of 300 instances. The processing time of each job j in stage i
($p_{ij}$) and its processing requirement ($size_{ij}$) are integers
randomly generated from the sets {1, ..., 100} and {1, ..., $m_i$},
respectively.

The algorithm implementing CDADS was coded in C++ and run on an Intel Core 2
Duo 2 GHz PC. The maximum CPU time is set to 60 seconds. The exploration is
also stopped when CDADS reaches a given lower bound on the makespan. If
CDADS does not prove optimality in this way, the best solution found when
the maximum CPU time is reached is taken to be the problem solution.

5.2 Restart Policy 

For the computational study, we have retained four priority rules to
generate the initial solutions (see Section 4.2). We have therefore
introduced a restart policy to benefit from these heuristics. At the
starting point, we use the best rule, that is, NSPT_LastStage. However, if
no improvement is observed during the CDADS search, we restart the process
with another solution obtained by applying the next rule, 'Energy', which
could lead to a better solution for this specific instance, and so on.

The restart policy is limited by the size of the heuristics pool: restarts
are allowed at most four times, since we have selected four rules. At each
restart k (starting from k = 0), we increase the maximum number of nodes
that can be visited according to a geometric series $nbrNodes \times f^k$,
where f is fixed to 1.3 and nbrNodes varies linearly with the problem size
(the number of jobs n; for example, for n = 20 we fix nbrNodes to 2000
nodes). Hence, the search space is expanded at each restart.
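
As a small worked example of this geometric growth, the sketch below lists the node budget of each run. It assumes that the four runs correspond to k = 0, 1, 2, 3, one per priority rule, and rounds the budget to the nearest integer; both choices, like the function name, are illustrative assumptions.

```python
def node_budgets(nbr_nodes, f=1.3, runs=4):
    """Node budget nbr_nodes * f**k for each run k = 0 .. runs-1."""
    return [round(nbr_nodes * f ** k) for k in range(runs)]


# Order in which the priority rules are tried (from Section 5.2).
RULES = ["NSPT_LastStage", "Energy", "SPT", "SPR"]
# For n = 20, nbr_nodes is 2000 according to the text.
schedule = list(zip(RULES, node_budgets(2000)))
```

With these assumptions, the budget grows from 2000 nodes for the first rule to 4394 nodes for the fourth.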



5.3 Results 

We tested two strategies for applying discrepancies: Top First and Bottom
First. In the Top First exploration, discrepancies at the top of the tree
are favored, while the Bottom First strategy favors discrepancies at the
bottom. The computational study shows that CDADS is markedly more efficient
with a Top First strategy (thus contradicting, for the problem at hand, the
statement of relative indifference to discrepancy order in [17]). Thus, the
results shown below refer to the Top First strategy.

Table 2 gives, for each configuration (n: number of jobs, and m: number of
stages) and each type, the average percentage deviation (%dev) and the
average CPU time. The average percentage deviation is measured in two ways:


For small problems, solutions are compared to the optimal solutions
($C^*_{\max}$ denotes the optimum makespan):

$$\%dev = \frac{C_{\max} - C^*_{\max}}{C^*_{\max}} \times 100;$$

For larger problems, solutions found by CDADS are compared to the
lower bound ($LB$):

$$\%dev = \frac{C_{\max} - LB}{LB} \times 100.$$

As explained in Section 5.2, CDADS is run four times for each instance, once
with each of the selected priority rules (NSPT_LastStage, Energy, SPT, SPR).
The best solution is taken to be the CDADS solution for the corresponding
problem. According to the findings of [19], the
$Fm(m_1, \ldots, m_m)|size_{ij}|C_{\max}$ problem and its symmetric
counterpart have the same optimal makespan. Based on this property, we apply
two-directional planning (forward schedule and backward schedule).
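
The two-directional idea can be sketched as follows, assuming that the symmetric problem is the one obtained by reversing the order of the stages (this reading, and all names below, are assumptions). A schedule built for the reversed instance is mapped back to the original instance by mirroring the time axis around its makespan.

```python
def reverse_instance(capacity, p, size):
    """Symmetric instance: same jobs, stages visited in reverse order."""
    return capacity[::-1], p[::-1], size[::-1]


def mirror_schedule(p, start):
    """Map a schedule of an instance with processing times p to a schedule
    of the stage-reversed instance with the same makespan."""
    m, n = len(p), len(p[0])
    cmax = max(start[m - 1][j] + p[m - 1][j] for j in range(n))
    # A task ending at time c in the given schedule starts at cmax - c
    # in the mirrored one; stage order is reversed accordingly.
    return [[cmax - (start[m - 1 - i][j] + p[m - 1 - i][j])
             for j in range(n)]
            for i in range(m)]
```

In practice one would schedule both the original and the reversed instance, mirror the backward schedule, and keep the better of the two makespans.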

From Table 2, it is observed that the average percentage deviation is higher
for 'Type-2' instances. Globally, %dev is 1.66% for 'Type-1' problems and
6.39% for 'Type-2' problems. This increase can be attributed to several
factors: the lower bound becomes less effective as $m_i$ increases in
'Type-2' instances, and so



Table 2. CDADS performance

              'Type-1' Problems      'Type-2' Problems
n      m      %dev      CPU(s)       %dev      CPU(s)
5      2      0         < 0.1        0         < 0.1
       5      0.21      < 0.1        0.46      < 0.1
       8      1.71      < 0.1        0.5       < 0.1
10     2      0         < 0.1        1.72      < 0.1
       5      0.66      0.4          6.44      < 0.1
       8      8.47      < 0.1        9.61      0.2
20     2      0.05      0.1          3.34      3.1
       5      2.57      1.1          7.97      1.3
       8      5.11      0.2          15        1.3
50     2      0.49      2.3          1.74      4.2
       5      0.54      5            8.2       13.5
       8      1.62      6.8          12.42     33.4
100    2      0.08      11.1         3.32      22.8
       5      1.5       13.6         10.75     40.9
       8      1.86      11           14.33     47.3

Global average  1.66    3.44         6.39      10.53



the average percentage deviation would be higher. Another explanation can
also be considered: the number of processors is fixed in 'Type-2' problems,
that is, $m_i = 5$, and the scheduling problem becomes more difficult to
solve for CDADS.
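
The global averages quoted above can be checked directly from the %dev columns of Table 2. The short sketch below does so, under the assumption that the entries left blank in the table stand for a deviation of 0.

```python
# %dev per (n, m) configuration, in the row order of Table 2.
# Assumption: blank table entries are read as 0.
type1_dev = [0, 0.21, 1.71, 0, 0.66, 8.47, 0.05, 2.57, 5.11,
             0.49, 0.54, 1.62, 0.08, 1.5, 1.86]
type2_dev = [0, 0.46, 0.5, 1.72, 6.44, 9.61, 3.34, 7.97, 15,
             1.74, 8.2, 12.42, 3.32, 10.75, 14.33]


def global_average(values):
    """Mean over the 15 configurations, rounded to two decimals."""
    return round(sum(values) / len(values), 2)
```

Both means agree with the reported figures of 1.66% and 6.39%, which supports reading the blank entries as zeros.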

Results show the behavior of our approach as n and m vary. For a given n,
the average percentage deviation increases with increasing m. Indeed, the
problem difficulty increases when m increases, and the obtained solution is
further away from the lower bound. On the other hand, for a given number of
stages m, increasing n has no significant effect on the average percentage
deviation: the stability of our method does not seem to be linked to the
number of jobs n. For instance, for a given m (e.g., m = 8), in 'Type-1'
problems, when n increases from 50 jobs to 100 jobs, the average percentage
deviation increases only slightly (from 1.62% to 1.86%). It can also be
noticed that, in some cases, increasing n results in a decrease in the
deviation value (for the configuration n = 20, m = 8, %dev is 5.11%,
whereas it is 1.62% for n = 50, m = 8). Apparently, the lower bound becomes
more effective as n increases.

From the experimental studies, it can be observed that CDADS converges
quickly. The average CPU time varies between less than 0.1 seconds and 47.3
seconds. The computational cost is higher for 'Type-2' instances, confirming
the difficulty of these problems. For a fixed m, increasing n leads to an
increase in CPU time. Likewise, when n is fixed, increasing m generally
increases the CPU time.

5.4 Comparison of CDADS Solutions with State-Of-the-Art Results 

Table 3 presents the results of CDADS in terms of %dev, the average
percentage deviation (as well as a summary of the average CPU time over all
instances, in the last line of the table). It also shows the results
obtained by Jouglet et al. in [10]. These are the most recent and best-known
solutions in the literature. Thus, we have compared the results of CDADS
with GA (genetic algorithm), CP (constraint programming), and MA (a memetic
algorithm that combines GA and CP). We disregard the results published by
Ercan et al. [Tl] because of inconsistencies encountered, and compare our
results only with those presented in [10]. However, we omit the average
deviations published in this latter paper because of a detected
miscalculation (induced by Ercan et al.'s errors). Hence, we recalculated
the average percentage deviation for all methods given in [10]. The maximum
CPU time is fixed at 900 seconds for GA, CP, and MA.

Table 3. Comparing average percentage deviation (and CPU time)

                  'Type-1' Problems                  'Type-2' Problems
n      m          CDADS   GA      CP      MA         CDADS   GA      CP      MA
5      2          0       0.29    0       0          0       1.23    0       0
       5          0.21    1.35    0       0          0.46    1.44    0       0
       8          1.71    4.15    0       0          0.5     2.38    0       0
10     2          0       0       0       0          1.72    2.83    1.72    1.75
       5          0.66    1.64    0       0          6.44    7.8     6.1     5.67
       8          8.47    9.38    10.32   8.02       9.61    10.87   8.37    8.8
20     2          0.05    0.44    2.59    0.66       3.34    3.7     6.72    3.43
       5          2.57    3.49    10.85   2.78       7.97    9.57    22.86   9.57
       8          5.11    5.69    17.98   5.32       15      17.26   28.52   16.02
50     2          0.49    0.63    2.79    0.49       1.74    2.76    6.54    2.21
       5          0.54    0.59    5.3     0.51       8.2     10.95   20.01   10.32
       8          1.62    2.17    14.42   1.71       12.42   15.89   30.06   17.25
100    2          0.08    0.15    1.96    0.07       3.32    3.05    5.68    2.7
       5          1.5     2.5     5.19    2.33       10.75   14.95   19.13   14.37
       8          1.86    1.99    9.47    2.15       14.33   20.06   23.15   17.83

Global average    1.66    2.27    5.39    1.6        6.39    7.28    11.92   8.32

Average CPU(s)    3.44    879.93  320.3   326.01     10.53   879.08  423.09  511.27



As shown in Table 3 (and as already noted in Table 2), the overall average
%dev obtained by CDADS is 1.66% and 6.39% for the 'Type-1' and 'Type-2'
problems, respectively. Compared with the corresponding averages of 2.27%
and 7.28% achieved by GA, and 5.39% and 11.92% obtained by CP, CDADS
outperforms both the GA and CP algorithms. Furthermore, CDADS is clearly
superior to CP, especially for larger instances (n = 50 and n = 100).

As shown in the table, MA finds slightly better solutions for 'Type-1'
problems: MA obtains 1.60%, while CDADS gives an average percentage
deviation of 1.66%. Overall, however, CDADS significantly outperforms MA,
since CDADS results are within 6.39% of the optimal solutions (or lower
bounds) for 'Type-2' problems, against 8.32% for MA.

To further assess the effectiveness of CDADS, we count the number of
improved known solutions. It can be seen from Table 4 that CDADS improves 75
known solutions among the 300 tested instances. Thus, the rate of
improvement reaches 25%. The results also show that most improvements occur
for large instances (n = 50, 100); see Figure 2. No significant improvements
are observed for small instances (n = 5, 10), since all optimal solutions
for these problems are known.



Table 4. Number of improved solutions

n        'Type-1' Problems    'Type-2' Problems
5        0                    0
10       1                    0
20       5                    10
50       8                    20
100      8                    23

total    22                   53



In this study, we also compare the convergence of the algorithms. It can be
seen from the last line of Table 3 that CDADS is faster than the genetic
algorithm (GA), constraint programming (CP), and the memetic algorithm (MA).
Indeed, CDADS takes between less than 0.1 seconds (for small problems) and
47.3 seconds (for large problems) to find its solutions, while the methods
proposed in [10] converge much more slowly [0.7 sec, 900 sec]. Even though
the results were obtained under different computational budgets, we can
conclude that CDADS demonstrates fast convergence. Indeed, according to
Dongarra's normalized coefficients [7], our machine is only approximately
3.5 times faster than the machine used by Jouglet et al.



[Figure 2: plot of the number of improved solutions; horizontal axis: Number of jobs]

Fig. 2. Variation of the number of improved solutions with the number of jobs



6 Conclusions 



In this paper, the hybrid flow shop problem with multiprocessor tasks is
addressed by means of a discrepancy search method. The proposed method,
Climbing Depth-bounded Adjacent Discrepancy Search (CDADS), is based on
adjacent discrepancies. We selected several heuristics to generate the
initial solution. A lower bound is also proposed to make the search more
efficient. Compared to the best-known results in the literature, CDADS
provides better solutions in little CPU time.

In the short term, we plan to apply CDADS to simpler problems such as the
classical hybrid flow shop ($size_{ij} = 1$), widely studied in the
literature.

Another expected aim would be to adapt the proposed implementation of dis- 
crepancy search to more general scheduling problems, in particular the Resource- 
Constrained Project Scheduling Problem, which still remains one of the most 
challenging problems in large-scale scheduling. 